Abstract. A new method for constructing minimum-redundancy prefix 
codes is described. This method does not explicitly build a Huffman 
tree; instead it uses a property of optimal codes to find the codeword 
length of each weight. The running time of the algorithm is shown to be 
O(nk), which is asymptotically faster than Huffman's algorithm when 
k = o(log n), where n is the number of weights and k is the number 
of distinct codeword lengths. We also sketch a matching lower bound of 
$\Omega(nk)$ for any such construction algorithm, indicating that our algorithm 
is asymptotically optimal in terms of n and k. 

1 Introduction 

Minimum-redundancy coding plays an important role in data compression 
applications [14]. Minimum-redundancy prefix codes give the best possible 
compression of a finite text when we use one static code for each symbol of the alphabet. 
This encoding is extensively used in various fields of computer science, such as 
picture compression, data transmission, etc. Therefore, the methods used for 
calculating sets of minimum-redundancy prefix codes that correspond to sets of 
input symbol weights are of great interest [3, 9, 7, 10]. 

The minimum-redundancy prefix code problem is to determine, for a given 
list $W = [w_1, \ldots, w_n]$ of n positive symbol weights, a list $L = [l_1, \ldots, l_n]$ of n 
corresponding integer codeword lengths such that $\sum_{i=1}^{n} 2^{-l_i} \le 1$ and 
$\sum_{i=1}^{n} w_i l_i$ is minimized. (Throughout the paper, when the Kraft inequality, 
$\sum_{i=1}^{n} 2^{-l_i} \le 1$, is satisfied, it is satisfied with an equality.) Once we have 
the codeword lengths corresponding to a given list of weights, constructing a 
corresponding prefix code can be easily done in linear time using standard techniques. 
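
As an illustration of that last step, the following minimal sketch assigns canonical codewords to a given list of codeword lengths. It is not taken from the paper: the names `kraft_sum` and `canonical_code` are hypothetical, and for brevity the sketch sorts the lengths, so it runs in O(n log n) rather than the linear time mentioned above. It assumes every length is at least 1 and that the lengths satisfy the Kraft inequality.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum(2^-l) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def canonical_code(lengths):
    """Assign canonical prefix codewords (bit strings) to codeword lengths.

    Symbols are processed by increasing length; each codeword is the
    previous one plus one, shifted left when the length grows.
    """
    order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
    codes = [None] * len(lengths)
    code = 0
    prev_len = lengths[order[0]]
    for i in order:
        code <<= lengths[i] - prev_len          # pad when the length increases
        codes[i] = format(code, "0{}b".format(lengths[i]))
        code += 1
        prev_len = lengths[i]
    return codes
```

For lengths `[2, 2, 2, 3, 3]` the Kraft sum is exactly 1 and the codewords come out as `00, 01, 10, 110, 111`.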

Finding a minimum-redundancy code for $W = [w_1, \ldots, w_n]$ is equivalent to 
finding a binary tree with minimum weighted external path length $\sum_{i=1}^{n} w(x_i) l(x_i)$ 
among all binary trees with leaves $x_1, \ldots, x_n$, where $w(x_i) = w_i$ and $l(x_i) = l_i$ 
is the depth of $x_i$ in the corresponding tree. Hence, if we define a leaf as a 
weighted node, the minimum-redundancy prefix code problem can be defined as 
the problem of constructing an optimal binary tree for a given list of leaves. 



* A preliminary version of the paper appeared in the 23rd Annual Symposium on 
Theoretical Aspects of Computer Science (STACS) 2006 [2]. 



Based on a greedy approach, Huffman's algorithm [6] constructs specific 
optimal trees, which are referred to as Huffman trees. The Huffman algorithm starts 
with a list $\mathcal{H}$ containing n leaves whose values correspond to the given n weights. 
In the general step, the algorithm selects the two nodes with the smallest values 
in the current list of nodes $\mathcal{H}$ and removes them from the list. Next, the removed 
nodes become children of a new internal node, which is inserted in $\mathcal{H}$. To this 
internal node is assigned a value that is equal to the sum of the values of its 
children. The general step repeats until there is only one node in $\mathcal{H}$, the root 
of the Huffman tree. The internal nodes of a Huffman tree are thereby assigned 
values throughout the algorithm; the value of an internal node is the sum of the 
weights of the leaves of its subtree. The Huffman algorithm requires O(n log n) 
time and linear space. Van Leeuwen [12] showed that the time complexity of 
Huffman's algorithm can be reduced to O(n) if the input list is already sorted. 
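
For reference, the general step described above can be sketched with a binary heap. This is a standard textbook rendering rather than code from the paper; the function name `huffman_lengths` and the parent-array representation are choices made here for illustration. It returns only the codeword length (depth) of each leaf.

```python
import heapq


def huffman_lengths(weights):
    """Codeword lengths produced by Huffman's algorithm, O(n log n) time."""
    n = len(weights)
    if n < 2:
        return [0] * n
    # Heap entries are (value, node id); leaves are 0..n-1.
    heap = [(w, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    parent = [None] * (2 * n - 1)
    next_id = n
    while len(heap) > 1:
        a, i = heapq.heappop(heap)      # the two nodes with the smallest values
        b, j = heapq.heappop(heap)
        parent[i] = parent[j] = next_id
        heapq.heappush(heap, (a + b, next_id))
        next_id += 1
    # A parent always has a larger id than its children, so one backward
    # pass from the root (id 2n-2) yields every depth.
    depth = [0] * (2 * n - 1)
    for v in range(2 * n - 3, -1, -1):
        depth[v] = depth[parent[v]] + 1
    return depth[:n]
```

Ties between equal values may be broken differently by other implementations, so individual lengths can differ between runs of different Huffman variants while the total cost stays the same.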

A distribution-sensitive algorithm is an algorithm whose running time relies 
on how the distribution of the input affects the output [8,11]. For example, a 
related such algorithm is that of Moffat and Turpin [10], who show how 
to construct a minimum-redundancy prefix code on a sorted-by-weight alphabet 
of n symbols and r distinct symbol weights in O(r + r log(n/r)) time. The 
algorithms proposed in [7] are, in a sense, also distribution-sensitive, since their 
additional space complexities depend on the maximum codeword length of the 
output code; the B-LazyHuff algorithm [7] runs in O(n) time and requires $O(l)$ 
extra storage to construct a minimum-redundancy prefix code on a sorted-by-weight 
alphabet, where $l$ is the maximum codeword length. 

Throughout the paper, we interchangeably use the terms leaves and weights. 
Unless otherwise stated, we assume that the input weights are unsorted. Unless 
explicitly stated, when mentioning a node of a tree we mean that it is either a 
leaf or an internal node. We number the levels of the tree bottom up starting 
from level 0, i.e. the root will have the highest level number $l$, its children will 
be at level $l - 1$, and the leaves furthest from the root will be at level 0 (this 
may be different from the standard level numbering). A weight at level j is then 
given a codeword of length $l - j$. 

In this paper, we give an asymptotically optimal distribution-sensitive 
algorithm for constructing minimum-redundancy prefix codes which runs in O(nk) time. 
We use the symbol k to represent the number of different codeword lengths, i.e. 
k is the number of levels that have leaves in the corresponding optimal binary 
tree. We also sketch a matching lower bound of $\Omega(nk)$ for any such construction 
algorithm. If the sequence of weights is sorted, our algorithm uses $O(\log^{2k-1} n)$ 
comparisons, which is sub-linear if the value of k is small. 

The paper is organized as follows: In Section 2, we give a property of optimal 
trees corresponding to prefix codes, on which our construction algorithm relies. 
In Section 3, we give the basic algorithm and prove its correctness. We show in 
Section 4 how to implement the algorithm to ensure the distribution-sensitive 
behavior; the bound on the running time we achieve in this section is exponential 
with respect to k. In Section 5, we improve our algorithm to achieve the O(nk) 
bound. We conclude the paper and sketch the lower bound proof in Section 6. 



2 The Exclusion property 



Consider a binary tree T that corresponds to a list of n weights $[w_1, \ldots, w_n]$ and 
has the following properties: 

1. The n leaves of T correspond to the given n weights. 

2. The value of a node equals the sum of the weights of the leaves of its subtree. 

3. For every level of T, let $y_1, y_2, \ldots$ be the nodes of that level in non-decreasing 
order with respect to their values, then $y_{2i-1}$ and $y_{2i}$ are siblings for all $i \ge 1$. 

We define the exclusion property [1,2] for T as follows: T has the exclusion 
property if and only if the values of the nodes at a level are not smaller than the 
values of the nodes at the lower levels. 

Lemma 1. [1] Given a prefix code whose corresponding tree T has the afore- 
mentioned properties, the given prefix code is optimal and T is a Huffman tree 
if and only if T has the exclusion property. 

Proof. First, assume that T does not have the exclusion property. It follows 
that there exist two nodes y and y' at levels $\eta$ and $\eta'$ such that $\eta < \eta'$ and 
value(y) > value(y'). Swapping the subtree of y with the subtree of y' results 
in another tree with a smaller external path length and a different list of levels, 
implying that the given prefix code is not optimal. 

Next, assume that T has the exclusion property. Let $[x_1, \ldots, x_n]$ be the list 
of leaves of T, with $w(x_i) \le w(x_{i+1})$. We prove by induction on the number 
of leaves n that T is an optimal binary tree that corresponds to an optimal 
prefix code. The base case follows trivially when n = 2. As a result of the 
exclusion property, the two leaves $x_1, x_2$ must be at the lowest level of T. Also, 
Property 3 of T implies that these two leaves are siblings. Alternatively, there 
is an optimal binary tree with leaves $[x_1, \ldots, x_n]$, where the two leaves $x_1, x_2$ 
are siblings; a fact that is used to prove the correctness of Huffman's algorithm 
[6]. Remove $x_1, x_2$ from T, replace their parent with a leaf $x_1 + x_2$ whose weight 
equals $w(x_1) + w(x_2)$, and let T' be the resulting tree. Since T' has the exclusion 
property, it follows using induction that T' is an optimal tree with respect to its 
leaves $[x_1 + x_2, x_3, \ldots, x_n]$. Hence, T is an optimal tree and corresponds to an 
optimal prefix code. Property 3 ensures that every two consecutive nodes in the 
non-decreasing order of values are siblings, which is precisely the way Huffman's 
algorithm constructs the codes. It follows that the tree T is a Huffman tree. □ 
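
To make the exclusion property concrete, here is a small checker, offered as a sketch rather than as part of the paper's method. The function name `has_exclusion_property` and the input format (a list whose j-th entry holds the leaf weights at level j, with level 0 the lowest) are assumptions made for this example. It rebuilds the node values level by level, pairing nodes in sorted order as Property 3 requires, and tests that no node is smaller than a node on the level below.

```python
def has_exclusion_property(leaves_by_level):
    """Check the exclusion property for a tree given by its leaf levels.

    leaves_by_level[j] lists the leaf weights at level j (level 0 lowest).
    Internal nodes are formed by pairing the sorted nodes of each level.
    Raises ValueError if the levels cannot form a full binary tree.
    """
    nodes, prev_max, level = [], None, 0
    while True:
        if level < len(leaves_by_level):
            nodes = nodes + list(leaves_by_level[level])
        if not nodes:
            raise ValueError("empty level below the root")
        nodes.sort()
        # Every node here must be at least as large as every node below.
        if prev_max is not None and nodes[0] < prev_max:
            return False
        if level >= len(leaves_by_level) - 1 and len(nodes) == 1:
            return True                      # reached the root
        if len(nodes) % 2 == 1:
            raise ValueError("odd number of nodes on a non-root level")
        prev_max = nodes[-1]
        # Property 3: consecutive nodes in sorted order are siblings.
        nodes = [nodes[i] + nodes[i + 1] for i in range(0, len(nodes), 2)]
        level += 1
```

For instance, placing weights 1 and 5 at level 0 and a second 5 at level 1 satisfies the property, whereas placing 5 and 5 at level 0 and 1 at level 1 does not.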

The sibling property was introduced by Gallager [5] (see also [13]). The sibling 
property states that if the nodes of a tree that corresponds to a prefix code are 
numbered in a non-decreasing order by their values, then this tree is a Huffman 
tree if and only if every two consecutive nodes (except the root) in this ordering 
are siblings. In fact, the sibling property is equivalent to Property 3 combined 
with the exclusion property. This equivalence can be directly proved, indicating 
the equivalence of a tree T that has the exclusion property to a Huffman tree. 

In general, building a tree T that has the exclusion property by evaluating all 
the internal nodes of T requires $\Omega(n \log n)$ time. This follows from the fact that once 
we have built T the sorted order of the input weights will be known, a problem 
that requires $\Omega(n \log n)$ time in the algebraic decision-tree model. It is crucial to 
mention that we do not have to explicitly construct T in order to find optimal 
codeword lengths. Instead, we only need to find the values of some of, and not 
all, the internal nodes to maintain the exclusion property. 

3 The basic construction method 

Given a list of weights, we build a corresponding optimal tree bottom up. Starting 
with the lowest level (level 0), a weight is momentarily assigned to a level as long 
as its value is less than the sum of the two nodes with the smallest values at 
that level. The Kraft inequality is enforced by making sure that the number of 
nodes at every level is even, and that the number of nodes at the highest level 
containing leaves is a power of two. This will result in some weights changing 
their initially assigned levels when moved upwards. 

3.1 Example 

For the sake of illustration, consider a list with thirty weights: ten weights have 
the value 2, ten have the value 3, five the value 5, and five the value 9. 

To construct the optimal codes, we start by finding the smallest two weights 
in the list; these will have the values 2,2. We now identify all the weights in 
the list with value less than 4, the sum of these two smallest weights. All these 
weights, ten weights of value 2 and ten of value 3, will be momentarily placed at 
level 0. The number of nodes at this level is even, so we move to the next upper 
level (level 1). We identify the smallest two nodes at level 1, amongst the two 
smallest internal nodes resulting from combining nodes of level 0 and the two 
smallest weights among those remaining in the list; these will be the two internal 
nodes 4, 4 whose sum is 8. All the remaining weights with value less than 8 are 
placed at level 1. This level now contains an odd number of nodes: ten internal 
nodes and five weights of value 5. To make this number even, we move the node 
with the largest value to the, still empty, next upper level (level 2). The node 
to be moved, in this case, is an internal node with value 6. Moving an internal 
node one level up implies moving the weights in its subtree one level up. So, the 
subtree consisting of the two weights of value 3 is moved one level up. At the 
end of this stage, level 0 contains ten weights of value 2 and eight weights of 
value 3; level 1 contains two weights of value 3 and five weights of value 5. For 
level 2, the smallest two internal nodes have values 6, 8 and the smallest weight 
in the list has value 9. This means that all the five remaining weights in the list 
will go to level 2. Since we are done with all the weights, we only need to enforce 
the condition that the number of nodes at level 3 is a power of two. Level 2 now 
contains eight internal nodes and five weights, for a total of thirteen nodes. All 
we need to do is to move the three nodes with the largest values, from level 2, 
one level up. The largest three nodes at level 2 are the three internal nodes of 
values 10, 12 and 12. So, we move eight weights of value 3 and two weights of 



value 5 one level up. As a result, the number of nodes at level 3 will be 8. The 
root will then be at level 6. 

The final distribution of weights will be: ten weights of value 2 at level 0; 
ten weights of value 3 and three weights of value 5 at level 1; and the remaining 
weights, two of value 5 and five of value 9, at level 2. The corresponding codeword 
lengths are 6, 5 and 4 respectively. 

3.2 The algorithm 

The idea of the algorithm should be clear. We construct an optimal tree by 
maintaining the exclusion property for all the levels. Once the weights are placed 
at the levels in such a way that the exclusion property is satisfied, the property 
will be satisfied for the internal nodes. Adjusting the number of nodes at each 
level will not affect the exclusion property, since we are always moving the largest 
nodes one level up to a still empty level. A formal description follows. (Note that 
the main ideas of our basic algorithm described in this subsection are quite 
similar to those of the Lazy-Traversal algorithm described in [7].) 

1. Let W be the list of symbol weights (not necessarily sorted). The smallest 
two weights are found, removed from W, and placed at the lowest level 0; 
their sum S is computed. The list W is scanned and all weights less than S 
are removed and placed at level 0. If the number of leaves at level 0 is odd, 
the leaf with the largest weight among these leaves is moved to level 1. 

2. In the general iteration, after moving weights from W to level $\eta$, determine 
the weights from W that will go to level $\eta + 1$ as follows. Find the smallest 
two internal nodes at level $\eta + 1$, and the smallest two weights remaining in 
W. Find the smallest two values amongst these four, and let their sum be 
S. Scan W for all weights less than S, and move them to level $\eta + 1$. If the 
number of nodes at level $\eta + 1$ is odd, move the subtree of the node with the 
largest value among these nodes to level $\eta + 2$. 

3. When W is exhausted, let m be the number of nodes at the highest level 
that has leaves. Move the $2^{\lceil \log_2 m \rceil} - m$ subtrees of the nodes with the largest 
values, from such level, one level up. 
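
The three steps can be sketched directly in code. The version below is a simplified, hedged rendering and not the authors' implementation: the name `basic_code_lengths` is hypothetical, the weights are sorted up front instead of scanning W at every level, and every node carries the explicit list of leaves in its subtree, so it makes no attempt at the running times discussed later. As in the example of Section 3.1, Step 3 is applied at the level where W becomes exhausted. At least two weights are assumed.

```python
def basic_code_lengths(weights):
    """Codeword lengths via the level-by-level construction (n >= 2)."""
    n = len(weights)
    order = sorted(range(n), key=lambda i: weights[i])  # simplification
    level = [0] * n      # level of each leaf, counted bottom-up
    pos = 0              # order[pos:] are the weights still in W
    nodes = []           # (value, leaves of subtree) on the current level
    cur = 0

    def move_up(node):
        # Moving a subtree one level up raises all of its leaves.
        for leaf in node[1]:
            level[leaf] += 1
        return node

    while True:
        nodes.sort(key=lambda t: t[0])
        # Two smallest values among current nodes and remaining weights.
        cand = sorted([v for v, _ in nodes[:2]] +
                      [weights[i] for i in order[pos:pos + 2]])
        s = cand[0] + cand[1]
        while pos < n and weights[order[pos]] < s:
            leaf = order[pos]
            level[leaf] = cur
            nodes.append((weights[leaf], [leaf]))
            pos += 1
        nodes.sort(key=lambda t: t[0])
        if pos == n:
            break
        # Keep the level even: carry the largest node one level up.
        carried = [move_up(nodes.pop())] if len(nodes) % 2 else []
        nodes = [(a[0] + b[0], a[1] + b[1])
                 for a, b in zip(nodes[0::2], nodes[1::2])] + carried
        cur += 1

    # Step 3: make the next level hold a power of two nodes.
    m = len(nodes)
    c = (m - 1).bit_length()                 # ceil(log2 m) for m >= 1
    for node in nodes[m - ((1 << c) - m):]:
        move_up(node)
    root = cur + c
    return [root - lv for lv in level]
```

On the thirty weights of Section 3.1 this sketch reproduces the distribution given there: ten codewords of length 6, thirteen of length 5, and seven of length 4.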

3.3 Proof of correctness 

To guarantee its optimality following Lemma 1, we need to show that both the 
Kraft inequality and the exclusion property hold for the constructed tree. 

By construction, the number of nodes at every level of the tree is even. At 
Step 3 of the algorithm, if m is a power of 2, no subtrees are moved up and 
the Kraft inequality holds. Otherwise, we move $2^{\lceil \log_2 m \rceil} - m$ nodes to the upper 
level, leaving $2m - 2^{\lceil \log_2 m \rceil}$ nodes at this level other than those of the subtrees 
that have just been moved one level up. Now, the number of nodes at the next 
upper level is $m - 2^{\lceil \log_2 m \rceil - 1}$ internal nodes resulting from combining pairs of 
nodes at this level, plus the $2^{\lceil \log_2 m \rceil} - m$ nodes that we have just moved. This 
sums up to $2^{\lceil \log_2 m \rceil - 1}$ nodes, that is a power of 2, and the Kraft inequality holds. 



Throughout the algorithm, we maintain the exclusion property by making 
sure that the sum of the two nodes with the smallest values is larger than all 
the values of the nodes at this level. When we move a subtree one level up, the 
root of this subtree is the node with the largest value at its level. Hence, all the 
nodes of this subtree at a certain level will have the largest values among the 
nodes of this level. Moving these nodes one level up will not alter the exclusion 
property. We conclude that the resulting tree has the exclusion property. 

4 The detailed construction method 

Up to this point, we have not shown how to evaluate the internal nodes needed by 
our basic algorithm, and how to search within the list W to decide which weights 
are at which levels. The basic intuition behind the novelty of our approach is that 
it does not require evaluating all the internal nodes of the tree corresponding to 
the prefix code, and would thus beat the $\Theta(n \log n)$ bound in several cases, 
a fact that will be asserted in the analysis. We show next how to implement the 
basic algorithm in a distribution-sensitive manner. 

4.1 An illustrating example 

The basic idea is clarified through an example with 3n/2 + 2 weights (n is a 
power of 2). Assume that the resulting optimal tree will turn out to have k = 3: 
n leaves at level 0, n/2 at level 1, and two at level $\log_2 n$. Note that the 3n/2 
leaves at levels 0 and 1 combine to produce two internal nodes at level $\log_2 n$. 

In such case, we show how to apply our algorithm such that the optimal 
codeword lengths will be produced in linear time. Determining the weights at 
level 0 can be easily done by finding the smallest two weights and scanning 
through the list of weights. Determining the weights at level 1 can also be easily 
done after finding the smallest two internal nodes at level 1, resulting from the 
smallest four weights from level 0, and scanning through the list of the remaining 
weights. A more involved task that we need to do next is to evaluate the smallest 
node y of the two internal nodes at level $\log_2 n$, which amounts to identifying 
the smallest n/2 nodes amongst the nodes at level 1. In order to be able to 
achieve this in linear time, we need to do it without having to evaluate all n/2 
internal nodes resulting from the pairwise combinations of the n weights at level 
0. We show that this can be done through a simple pruning procedure. The 
nodes at level 1 consist of two sets; one set has n/2 leaves whose weights are 
known and thus their median M can be found in linear time [4], and another 
set containing n/2 internal nodes which are not known but whose median M' 
can still be computed in linear time, by simply finding the two middle weights 
of the n leaves at level 0 and adding them. Assuming without loss of generality 
that M > M', then the larger half of the n/2 weights at level 1 can be safely 
discarded as not contributing to y, and the smaller half of the n weights at level 
0 are guaranteed to contribute to y. The above step is repeated recursively on 
a problem half the size. This results in a procedure satisfying the recurrence 
T(n) = T(n/2) + O(n), and hence T(n) = O(n). 

If the list of weights is already sorted, no comparisons are required to find 
M or M'. The total number of comparisons needed will satisfy the recurrence 
$C_s(n) = C_s(n/2) + O(1)$, and hence $C_s(n) = O(\log n)$. 
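
Both recurrences of this example unfold into elementary sums. In the short derivation below, the constants $c$ and $c'$ are names introduced here for the hidden constants of the O(n) and O(1) terms; they do not appear in the original text.

```latex
\begin{align*}
T(n)   &\le cn + T(n/2)
        \le cn\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right)
        \le 2cn = O(n), \\
C_s(n) &\le c' + C_s(n/2)
        \le c' \log_2 n + C_s(1) = O(\log n).
\end{align*}
```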

4.2 The algorithm 

Let $\eta_1 = 0 < \eta_2 < \ldots < \eta_j$ be the levels that have already been assigned weights 
at some step of our algorithm (other levels only have internal nodes), $n_j$ be the 
count of the leaves so far assigned to level $\eta_j$, and $N_j = \sum_{i=1}^{j} n_i$. 

At this iteration, we are looking to find the next upper level $\eta_{j+1}$ 
that will be assigned weights by our algorithm. We use the fact that the weights 
that have already been assigned to these j levels are the only weights that may 
contribute to the values of the internal nodes below and up to level $\eta_{j+1}$. 

Consider the internal node $\mathcal{I}_j$ at level $\eta_j$, where the sum of the counts of 
the weights contributing to level-$\eta_j$ internal nodes whose values are smaller (or 
larger) than that of $\mathcal{I}_j$ is at most $N_{j-1}/2$. We call $\mathcal{I}_j$ the splitting node of the 
internal nodes at level $\eta_j$. In other words, if we define the multiplicity of a node 
to be the number of leaves in its subtree, then $\mathcal{I}_j$ is the weighted-by-multiplicity 
median within the sorted-by-value sequence of the internal nodes at level $\eta_j$. 
Analogously, consider a node $\mathcal{S}_j$ (not necessarily an internal node) at level $\eta_j$, 
where the sum of the counts of the weights contributing to level-$\eta_j$ internal nodes 
whose values are smaller (or larger) than that of $\mathcal{S}_j$ plus the count of level-$\eta_j$ 
leaves whose values are smaller (or larger) than that of $\mathcal{S}_j$ is at most $N_j/2$. We 
call $\mathcal{S}_j$ the splitting node of all the nodes at level $\eta_j$. Informally, $\mathcal{I}_j$ is an internal 
node that splits the weights below level $\eta_j$ in two groups having almost equal 
counts, and $\mathcal{S}_j$ is the node that splits the weights below and up to level $\eta_j$ in 
two groups having almost equal counts. 
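
The definition of a splitting node as a weighted-by-multiplicity median can be illustrated on an explicit list of nodes. The sketch below is only meant to pin down the definition: the name `splitting_node` and the (value, multiplicity) input format are assumptions of this example, and it sorts fully materialized nodes, which is precisely what the pruning procedure described next avoids. Nodes with equal values are ordered arbitrarily by the sort.

```python
def splitting_node(nodes):
    """Index of the weighted-by-multiplicity median of `nodes`.

    nodes is a list of (value, multiplicity) pairs, where multiplicity
    is the number of leaves in the node's subtree (1 for a leaf).
    The returned node has at most half of the total multiplicity on
    each side in the sorted-by-value order.
    """
    total = sum(m for _, m in nodes)
    order = sorted(range(len(nodes)), key=lambda i: nodes[i][0])
    below = 0                        # multiplicity of strictly earlier nodes
    for i in order:
        m = nodes[i][1]
        above = total - below - m    # multiplicity of later nodes
        if 2 * below <= total and 2 * above <= total:
            return i
        below += m
    raise ValueError("empty node list")
```

With nodes `(4, 2), (6, 2), (8, 4), (10, 2)` the node of value 8 is returned: four leaves lie below it and two above, out of ten in total.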

Finding the splitting node. Consider the following pruning procedure which 
finds the splitting node $\mathcal{S}_j$ of all the nodes at level $\eta_j$ by utilizing another 
procedure that identifies the splitting nodes of the internal nodes at the same level. 
We find the leaf M with the median weight among the list of the $n_j$ weights 
already assigned to level $\eta_j$ (partition the $n_j$ list into two sublists around M), 
and recursively evaluate the splitting node M' of the internal nodes at level $\eta_j$ 
using the list of the $N_{j-1}$ weights of the lower levels (partition the $N_{j-1}$ list into 
two sublists around M'). Comparing the values of M and M', assume without 
loss of generality that M > M'. We conclude that the weights that are larger 
than M must be larger than $\mathcal{S}_j$, and the internal nodes whose values are smaller 
than M' must be smaller than $\mathcal{S}_j$. The two corresponding sublists are accordingly 
discarded, and a new median M and a new splitting node M' are found 
for the remaining two sublists. The pruning procedure continues until only the 
node $\mathcal{S}_j$ remains. As a byproduct, we also know which weights contribute to the 
nodes at level $\eta_j$ whose values are smaller (or larger) than that of $\mathcal{S}_j$. 



Now, consider the problem of finding the splitting node $\mathcal{I}_{j+1}$ of the internal 
nodes at level $\eta_{j+1}$. Observe that $\mathcal{S}_j$ is a descendant of $\mathcal{I}_{j+1}$, so we start by 
recursively finding the node $\mathcal{S}_j$. Let $\alpha$ be the count of the nodes whose values 
are smaller than $\mathcal{S}_j$ at level $\eta_j$. Knowing that exactly $\lambda = 2^{\eta_{j+1} - \eta_j}$ nodes from 
level $\eta_j$ contribute to every internal node at level $\eta_{j+1}$, we conclude that the 
largest $\beta = \alpha - \lambda \cdot \lfloor \alpha/\lambda \rfloor$ nodes among these $\alpha$ nodes, as well as the smallest 
$\lambda - \beta - 1$ nodes among those whose values are larger than $\mathcal{S}_j$, are the remaining 
nodes contributing to $\mathcal{I}_{j+1}$. We proceed by finding such nodes, a procedure that 
requires recursively evaluating more splitting nodes at level $\eta_j$, in a way that 
will be illustrated in the next subsection. 

To summarize, the splitting node $\mathcal{I}_{j+1}$ of level $\eta_{j+1}$ is evaluated as follows. 
The aforementioned pruning procedure is applied to split the weights already 
assigned to the lower j levels to three groups; those contributing to $\mathcal{S}_j$, those 
contributing to the nodes of level $\eta_j$ that are smaller than $\mathcal{S}_j$, and those 
contributing to the nodes of level $\eta_j$ that are larger than $\mathcal{S}_j$. The weights 
contributing to $\mathcal{I}_{j+1}$ are: the weights of the first group, the weights among the second 
group contributing to the largest $\beta$ nodes smaller than $\mathcal{S}_j$, and the weights among 
the third group contributing to the smallest $\lambda - \beta - 1$ nodes larger than $\mathcal{S}_j$. 

Let $T(N_j, j)$ be the time required to find $\mathcal{I}_{j+1}$. The total amount of work, in 
all the recursive calls, required to find the medians among the $n_j$ weights assigned 
to level $\eta_j$ is $O(n_j)$. During the pruning procedure to evaluate $\mathcal{S}_j$, the time for 
the i-th recursive call to find a splitting node at level $\eta_j$ is $T(N_{j-1}/2^{i-1}, j-1)$. 
The pruning procedure, therefore, requires up to $\sum_{i \ge 1} T(N_{j-1}/2^{i-1}, j-1) + O(n_j)$ 
time. To find the t-th smallest (or largest) nodes among each group which 
constitutes at most $N_{j-1}/2$ of the leaves, several calls to evaluate splitting nodes 
are also initiated. The time for the i-th such recursive call is $T(N_{j-1}/2^{i}, j-1)$, 
for a total of $\sum_{i \ge 1} T(N_{j-1}/2^{i}, j-1) + O(n_j)$ time for each of the two groups 
(see next subsection). Summing up the bounds, the next relations follow: 

$$T(N_1, 1) = O(n_1),$$

$$T(N_j, j) \le \sum_{i \ge 1} T(N_{j-1}/2^{i-1}, j-1) + 2 \sum_{i \ge 1} T(N_{j-1}/2^{i}, j-1) + O(n_j).$$

Substitute with $T(a, b) \le c \cdot 4^{b} a$, for $a < N_j$, $b < j$, and some big enough 
constant c. Then, 

$$T(N_j, j) \le c \cdot 4^{j-1} N_{j-1} \Big( \sum_{i \ge 1} 1/2^{i-1} + 2 \sum_{i \ge 1} 1/2^{i} \Big) + O(n_j)$$

$$\le c \cdot 4^{j} N_{j-1} + c \cdot n_j,$$

where the second line uses the fact that each of the two terms in parentheses equals 2. 
Using the fact that $N_j = N_{j-1} + n_j$, then 

$$T(N_j, j) = O(4^{j} N_j).$$



Finding the t-th smallest (or largest) node. Consider the node $\mathcal{T}_j$ at level 
$\eta_j$, which has the t-th smallest (or largest) value among the nodes at level $\eta_j$. 
The following recursive procedure is used to evaluate $\mathcal{T}_j$. 

As for the case of finding the splitting node, we find the leaf with the median 
weight M among the list of the $n_j$ weights already assigned to level $\eta_j$, and 
evaluate the splitting node M' of the internal nodes at level $\eta_j$ (applying the 
above recursive procedure) using the list of the $N_{j-1}$ leaves of the lower levels. 
Comparing M to M', we can discard one of the four sublists - the two sublists 
of $n_j$ leaves and the two sublists of $N_{j-1}$ leaves - as not contributing to $\mathcal{T}_j$. 
Repeating this pruning procedure, we identify the weights that contribute to $\mathcal{T}_j$ 
and hence evaluate $\mathcal{T}_j$. As a byproduct, we also know which weights contribute 
to the nodes at level $\eta_j$ whose values are smaller (or larger) than that of $\mathcal{T}_j$. 

Let $T'(N_j, j)$ be the time required by the above procedure. Then, 

$$T'(N_j, j) \le \sum_{i \ge 1} T(N_{j-1}/2^{i-1}, j-1) + O(n_j) = O(4^{j} N_j).$$

Finding $\eta_{j+1}$ (the next level that will be assigned weights). We start by 
finding the minimum weight w among the weights remaining in W at this point 
of the algorithm, and use this weight to search within the nodes at level $\eta_j$ in a 
manner similar to binary search. The basic idea is to find the maximum number 
of the smallest-valued nodes at level $\eta_j$, such that the sum of their values is less 
than w. We find the splitting node $\mathcal{S}_j$ at level $\eta_j$, and evaluate the sum of the 
weights contributing to the nodes at that level whose values are smaller than 
that of $\mathcal{S}_j$. Comparing this sum with w, we decide which sublist of the $N_j$ leaves 
to proceed with and find its splitting node. At the end of this searching procedure, we 
would have identified the weights contributing to the $\gamma$ smallest nodes at level 
$\eta_j$, such that the sum of their values is less than w and $\gamma$ is maximum ($\gamma \ge 2$). 
We conclude by setting $\eta_{j+1}$ to be equal to $\eta_j + \lfloor \log_2 \gamma \rfloor$. 

To prove the correctness of this procedure, consider any level $\eta$, such that 
$\eta_j < \eta < \eta_j + \lfloor \log_2 \gamma \rfloor$. The values of the two smallest internal nodes at level 
$\eta$ are contributed to by at most $2^{\eta - \eta_j + 1} \le 2^{\lfloor \log_2 \gamma \rfloor} \le \gamma$ nodes from level $\eta_j$. 
Hence, the sum of these two values is less than w. For the exclusion property to 
hold, no weights are assigned to any of these levels. On the contrary, the values 
of the two smallest internal nodes at level $\eta_j + \lfloor \log_2 \gamma \rfloor$ are contributed to by 
more than $\gamma$ nodes from level $\eta_j$, and hence their sum is more than w. For the 
exclusion property to hold, at least this weight w is to be assigned to this level. 
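
The rule for the next level can be checked on small inputs with the following sketch. It is an illustration under simplifying assumptions, not the search procedure of the paper: the name `next_leaf_level` is hypothetical, and the node values at the current level are given explicitly and sorted, whereas the paper locates the same count through splitting nodes. The count of smallest nodes whose total stays below w is called `gamma` here.

```python
def next_leaf_level(current_level, node_values, w):
    """Next level that receives weights, given the current level's nodes.

    node_values: values of all nodes at the current level.
    w: the minimum weight still unassigned.
    gamma is the largest count of smallest nodes whose sum is below w;
    the next level is current_level + floor(log2(gamma)).
    """
    gamma, total = 0, 0
    for v in sorted(node_values):
        if total + v >= w:
            break
        total += v
        gamma += 1
    if gamma < 2:
        # The text assumes gamma >= 2 at this point of the algorithm.
        raise ValueError("fewer than two nodes sum to less than w")
    return current_level + gamma.bit_length() - 1   # floor(log2(gamma))
```

For the example of Section 3.1, the level-0 nodes are ten 2s and ten 3s and the smallest remaining weight is 5, so `gamma` is 2 and the next level is 1.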

The time required by this procedure is the O(n) time to find the weight 
w among the weights remaining in W, plus the time for the calls to find the 
splitting nodes. Let $T''(N_j, j)$ be the time required by this procedure. Then, 

$$T''(N_j, j) \le \sum_{i \ge 1} T(N_j/2^{i-1}, j) + O(n) = O(4^{j} N_j + n).$$

Maintaining Kraft inequality. After deciding the value of $\eta_{j+1}$, we need to 
maintain the Kraft inequality in order to produce a binary tree corresponding to the 
optimal prefix code. This is accomplished by moving the subtrees of the $\nu$ nodes 
with the largest values from level $\eta_j$ one level up. Let $\mu$ be the number of nodes 
currently at level $\eta_j$ and let $\lambda = 2^{\eta_{j+1} - \eta_j}$, then the number of the nodes to be 
moved up is $\nu = \lambda \cdot \lceil \mu/\lambda \rceil - \mu$. Note that when $\eta_{j+1} - \eta_j = 1$ (as in the case of 
our basic algorithm), then $\nu$ equals one if $\mu$ is odd and zero otherwise. 
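
The count of nodes to move is a one-line computation. In the sketch below, the function name `nodes_to_move` and its parameters `mu` (number of nodes at the current level) and `gap` (distance to the next level that receives weights) are names chosen here for illustration.

```python
def nodes_to_move(mu, gap):
    """Number of largest nodes to move one level up.

    Rounds mu up to the next multiple of lam = 2**gap and returns
    the difference, i.e. lam * ceil(mu / lam) - mu.
    """
    lam = 1 << gap
    return lam * -(-mu // lam) - mu    # -(-a // b) is integer ceil(a / b)
```

With a gap of 1 this returns 1 for an odd count and 0 for an even one, matching the basic algorithm; with 13 nodes and a gap of 2 it returns 3.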

To establish the correctness of this procedure, we need to show that both the 
Kraft inequality and the exclusion property hold. For a realizable construction, 
the number of nodes at level $\eta_j$ has to be even, and if $\eta_{j+1} - \eta_j > 1$, the number 
of nodes at level $\eta_j + 1$ has to be a multiple of $\lambda/2$. If $\mu$ is a multiple of $\lambda$, no 
subtrees are moved to level $\eta_j + 1$ and the Kraft inequality holds. Otherwise, 
$\lambda \cdot \lceil \mu/\lambda \rceil - \mu$ nodes are moved to level $\eta_j + 1$, leaving $2\mu - \lambda \cdot \lceil \mu/\lambda \rceil$ nodes 
at level $\eta_j$ other than those of the subtrees that have just been moved one level up. 
Now, the number of nodes at level $\eta_j + 1$ is $\mu - \lambda \cdot \lceil \mu/\lambda \rceil / 2$ internal nodes 
resulting from the nodes of level $\eta_j$, plus the $\lambda \cdot \lceil \mu/\lambda \rceil - \mu$ nodes that we have 
just moved. This sums up to $\lambda \cdot \lceil \mu/\lambda \rceil / 2$ nodes, which is a multiple of $\lambda/2$, and 
the Kraft inequality holds. The exclusion property holds following the same 
argument given in Section 3.3. 

The running time of this procedure is the time needed to find the weights 
contributing to the $\nu$ nodes with the largest values at level $\eta_j$, which is $O(4^{j} N_j)$. 

Summary of the algorithm. 

1. The smallest two weights are found, moved from W to the lowest level $\eta_1 = 0$, 
and their sum S is computed. The rest of W is searched for weights less than 
S, which are moved to level 0 as well. 

2. In the general iteration of the algorithm, after assigning weights to j levels, 
perform the following steps: 

(a) Find $\eta_{j+1}$ (the next level that will be assigned weights). 

(b) Maintain the Kraft inequality at level $\eta_j$ (by moving the $\nu$ subtrees with 
the largest values from this level one level up). 

(c) Find the values of the smallest two internal nodes at level $\eta_{j+1}$, and the 
smallest two weights from those remaining in W. Find the two nodes 
with the smallest values among these four, and let their sum be S. 

(d) Search the rest of W, and move the weights less than S to level $\eta_{j+1}$. 

3. When W is exhausted, maintain the Kraft inequality at the highest level that 
has been assigned weights. 

4.3 Complexity analysis 

Using the bounds deduced for the described steps of the algorithm, we conclude 
that the time required by the general iteration is $O(4^{j} N_j + n)$. 

To complete the analysis, we need to show the effect of maintaining the Kraft 
inequality on the complexity of the algorithm. Consider the scenario when, as a 
result of moving subtrees one level up, all the weights at a level move up to the 
next level that already had other weights. As a result, the number of levels that 
contain leaves decreases. It is possible that within a single iteration the number 
of such levels decreases to half its value. If this happened for several iterations, 
the amount of work done by the algorithm would be large compared to k, the 
actual number of distinct codeword lengths. Fortunately, this scenario cannot 
happen often. In the next lemma, we bound the 
number of iterations performed by the algorithm by 2k. We also show that at 
any step of the algorithm the number of levels that are assigned weights, and 
hence the number of iterations performed, is at most twice the number of the 
distinct optimal codeword lengths for the weights that have been assigned so far. 

Lemma 2. Consider the set of weights that will have the $\tau$-th largest optimal 
codeword length at the end of the algorithm. During the execution of the 
algorithm, these weights will be assigned to at most two consecutive (with respect to 
the levels that contain leaves) levels, with level numbers, at most, $2\tau - 1$ and $2\tau$. 
Hence, the number of iterations performed by the algorithm is at most 2k. 

Proof. Consider a set of weights that will turn out to have the same codeword 
length. During the execution of the algorithm, assume that some of these weights 
are assigned to three levels. Let $\eta_j < \eta_{j+1} < \eta_{j+2}$ be such levels. Since we are 
maintaining the exclusion property throughout the algorithm and since 
$\eta_{j+1} < \eta_{j+2}$, there will exist some internal nodes at level $\eta_{j+1}$ whose values are strictly 
smaller than the values of the weights at level $\eta_{j+2}$ (some may have the same 
value as the smallest weight at level $\eta_{j+2}$). The only way for such weights to catch 
each other at the same tree level would be as a result of moving subtrees up to 
maintain the Kraft inequality. Suppose that, at some point of the algorithm, 
the weights that are currently at level $\eta_j$ are moved up to catch the weights at 
level $\eta_{j+2}$. It follows that the internal nodes that are currently at level $\eta_{j+1}$ 
will accordingly move to the next upper level of the moved weights. As a result, 
the exclusion property will not hold; a fact that contradicts the behavior of our 
algorithm. It follows that these weights will never be at the same tree level. 

We prove the second part of the lemma by induction. The base case follows 
easily for $r = 1$. Assume that the argument is true for $r - 1$. By induction, 
the weights that will have the $(r - 1)$-th largest optimal codeword 
length are assigned to levels numbered at most $2r - 3$ and $2r - 2$. From the exclusion 
property, it follows that the weights that have the r-th largest optimal codeword 
length must be at the next upper levels. Using the first part of the lemma, the 
number of such levels is at most two. It follows that these weights are assigned 
to levels numbered at most $2r - 1$ and $2r$ among the levels that are assigned weights. 

Hence, the weights with the r-th largest optimal codeword length will be 
assigned within 2r iterations. Since the number of distinct codeword lengths is 
k, the number of iterations performed by the algorithm is at most 2k. □ 

Using Lemma 2, the time required by our algorithm to assign the set of 
weights whose optimal codeword length is the j-th largest, among all distinct 
lengths, is $O(4^{2j} n) = O(16^j n)$. Summing for all such lengths, the total time 
required by our algorithm is $\sum_{j=1}^{k} O(16^j n) = O(16^k n)$. 
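
The sum above is a geometric series dominated by its last term. Spelled out as a worked step (ignoring the constants hidden in the O-notation):

\[
\sum_{j=1}^{k} 16^j n = \frac{16\,(16^k - 1)}{15}\, n < \frac{16}{15} \cdot 16^k n = O(16^k n).
\]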

Consider the case when the list of weights W is already sorted. The following 
theorem follows using similar recursive relations to those in this section. 



Theorem 1. If the list of weights is presorted, constructing minimum-redundancy 
prefix codes can be done using $O(\log^{2k-1} n)$ comparisons. 

Corollary 1. For $k < c \cdot \log n / \log\log n$, and any constant $c < 0.5$, the above 
algorithm requires o(n) comparisons if the list of weights is presorted. 
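
A short worked step shows why the corollary follows, assuming the bound of Theorem 1 reads $O(\log^{2k-1} n)$ and taking logarithms to base 2. For $n > 2$ and $k < c \cdot \log n / \log\log n$:

\[
\log^{2k-1} n < (\log n)^{2k} = 2^{2k \log\log n} < 2^{2c \log n} = n^{2c},
\]

which is o(n) whenever $c < 0.5$.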

5 The improved algorithm 

The drawback of the algorithm we described in the previous section is that it uses 
many recursive median-finding calls. We perform the following enhancement to 
the algorithm, if we start with an unsorted list of weights. The basic idea we use 
here is to incrementally process the assigned weights throughout the algorithm 
by partitioning them into unsorted blocks, such that the weights of one block are 
smaller than or equal to the smallest weight of the succeeding block. The time bound 
required by the recursive calls improves when handling these shorter blocks. 

The invariant we maintain is that during the execution of the general iteration 
of the algorithm, after assigning weights to j levels, the weights that have already 
been assigned to a level $\eta_{j'}$, $j' < j$, are partitioned into blocks each of size at 
most $\max\{n_{j'}/4^{j-j'}, 1\}$, such that the weights of each block are smaller than or equal 
to the smallest weight of the succeeding block. To accomplish this invariant, 
once we assign weights to a level, the weights of each block among those already 
assigned to all the lower levels are partitioned into four almost equal blocks, by 
finding the weights at the three quartiles and partitioning around these weights. 
Using Lemma 2, the number of iterations performed by the algorithm is at most 
2k. The amount of work required for this partitioning is O(n) per iteration, for 
a total of an extra O(nk) time for the partitioning procedure. 
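
The block refinement described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and a randomized quickselect-style split (expected linear time) stands in for whatever linear-time selection routine the algorithm actually uses to locate the quartiles.

```python
import random


def split_smallest(items, k):
    """Return (left, right): left holds the k smallest items, both unsorted.

    Randomized quickselect-style partition, expected linear time.
    Stand-in for a linear-time selection routine (an assumption).
    """
    items = list(items)
    if k <= 0:
        return [], items
    if k >= len(items):
        return items, []
    pivot = random.choice(items)
    lows = [x for x in items if x < pivot]
    pivots = [x for x in items if x == pivot]
    highs = [x for x in items if x > pivot]
    if k <= len(lows):
        left, right = split_smallest(lows, k)
        return left, right + pivots + highs
    if k <= len(lows) + len(pivots):
        take = k - len(lows)
        return lows + pivots[:take], pivots[take:] + highs
    left, right = split_smallest(highs, k - len(lows) - len(pivots))
    return lows + pivots + left, right


def split_into_quarters(block):
    """Split one unsorted block into up to four ordered, near-equal blocks."""
    lower, upper = split_smallest(block, (len(block) + 1) // 2)
    quarters = []
    for half in (lower, upper):
        a, b = split_smallest(half, (len(half) + 1) // 2)
        quarters.extend(q for q in (a, b) if q)  # drop empty pieces
    return quarters


def refine(blocks):
    """One refinement round: every block is replaced by its quarters."""
    return [q for block in blocks for q in split_into_quarters(block)]
```

Each call to `refine` touches every weight a constant expected number of times, which mirrors the O(n) cost per iteration stated above. After t rounds a block that started with m weights has been cut into pieces of size at most the ceiling of m/4^t, so the pieces become singletons, and the level becomes sorted, after about log_4 m rounds.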

For $j - j' > \log_4 N_{j'}$, all the weights assigned to level $\eta_{j'}$ and the lower levels 
are already sorted as a result of the partitioning procedure. We maintain the 
invariant that the internal nodes of all these levels are evaluated, by performing 
the following incremental evaluation procedure once the above condition is 
satisfied. The internal nodes at level $\eta_{j'-1}$ must have been evaluated in a previous 
iteration, since the above condition must have been satisfied for level $\eta_{j'-1}$. What 
we need to do is to merge the sorted sequence of the weights assigned to level 
$\eta_{j'-1}$ with the sorted sequence of the internal nodes of level $\eta_{j'-1}$ and evaluate 
the corresponding internal nodes at level $\eta_{j'}$. This extra work can be done in a 
total of O(n) time. As a result, finding the value of a node (the splitting node or 
the t-th smallest node) within any of these j' levels can now be done in constant 
time, as indicated within the recursive relations below. 
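
The incremental evaluation step can be illustrated with a small sketch. It makes two simplifying assumptions that are not stated in the text: the two levels are adjacent in the tree, and the lower level holds an even number of nodes. The function name `parents_of_level` is hypothetical. Pairing consecutive nodes of the merged sorted sequence yields the values of the internal nodes one level up, and that output is again sorted.

```python
from heapq import merge


def parents_of_level(sorted_weights, sorted_internal):
    """Values of the internal nodes one level above, in sorted order.

    sorted_weights:  leaf weights assigned to the lower level, ascending.
    sorted_internal: already-evaluated internal nodes of that level, ascending.
    Assumes adjacent tree levels and an even node count (illustration only).
    """
    nodes = list(merge(sorted_weights, sorted_internal))  # linear-time merge
    if len(nodes) % 2:
        raise ValueError("a full level must hold an even number of nodes")
    # Consecutive nodes in sorted order become siblings; each pair's sum is
    # the value of their parent.
    return [nodes[i] + nodes[i + 1] for i in range(0, len(nodes), 2)]
```

For example, `parents_of_level([2, 4], [3, 5])` merges to `[2, 3, 4, 5]` and returns `[5, 9]`. Both the merge and the pairing are linear in the number of nodes on the level.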

The basic step for all our procedures is to find the median weight among the 
weights already assigned to a level $\eta_{j'}$. This step can now be done faster. To find 
such median weight, we can identify the block that has such median (the middle 
block) in constant time, then we find the required weight in $O(\max\{n_{j'}/4^{j-j'}, 1\})$ time, 
which is the size of the block at this level. Let $G_j(N_{j'}, j')$ be the time spent 
by the improved algorithm at and below level j' while assigning the weights at 
level j, where $j' < j$. The following recursive relations follow: 



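\[
G_j(N_1, 1) = O(\max\{n_1/4^{j-1}, 1\}),
\]
\[
G_j(N_{j'}, j') = O(1) \quad \text{if } j - j' > \log_4 N_{j'},
\]
\[
G_j(N_{j'}, j') \le \sum_{i \ge 1} G_j(N_{j'-1}/2^{i-1}, j'-1) + 2 \sum_{i \ge 1} G_j(N_{j'-1}/2^{i}, j'-1) + O(\max\{n_{j'}/4^{j-j'}, 1\}) \quad \text{otherwise.}
\]

Substitute with $G_j(a, b) \le c \cdot \max\{a/4^{j-b}, 1\}$, for $a < N_{j'}$, $b < j'$, and some 
big enough constant c. Then, 

\[
G_j(N_{j'}, j') \le c \cdot \max\Bigl\{ \frac{N_{j'-1}}{4^{j-j'+1}} \cdot \Bigl( \sum_{i \ge 1} \frac{1}{2^{i-1}} + 2 \sum_{i \ge 1} \frac{1}{2^{i}} \Bigr) + \frac{n_{j'}}{4^{j-j'}},\ 1 \Bigr\}
\le c \cdot \max\{ (N_{j'-1} + n_{j'})/4^{j-j'},\ 1 \}.
\]

Since $N_{j'} = N_{j'-1} + n_{j'}$, it follows that 

\[
G_j(N_{j'}, j') = O(\max\{N_{j'}/4^{j-j'}, 1\}).
\]

The work done to assign the weights at level j is therefore 

\[
G_j(N_j, j) = O(N_j) = O(n).
\]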
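
The step from level $j'-1$ to level $j'$ in the substitution gains exactly one factor of 4. As a worked check, assuming the first sum of the recurrence ranges over $N_{j'-1}/2^{i-1}$ and the second over $N_{j'-1}/2^{i}$:

\[
\sum_{i \ge 1} \frac{1}{2^{i-1}} + 2 \sum_{i \ge 1} \frac{1}{2^{i}} = 2 + 2 = 4,
\qquad \text{so} \qquad
\frac{N_{j'-1}}{4^{j-j'+1}} \cdot 4 = \frac{N_{j'-1}}{4^{j-j'}}.
\]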

By Lemma 2, the number of iterations performed by the algorithm is at most 2k. 
Summing up for these iterations, the running time for performing the 
recursive calls is O(nk). The next main theorem follows. 
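
The faster median step used in this section can be sketched as a rank lookup over the blocks of one level. The sketch assumes, for illustration only, that all blocks of the level have the same known size (except possibly the last) and are stored in order. The name `select_via_blocks` is hypothetical, and sorting the single block stands in for a linear-time selection inside it.

```python
def select_via_blocks(blocks, block_size, rank):
    """Return the weight of the given 0-based rank within one level.

    blocks: unsorted lists in order, every weight of a block being no larger
            than any weight of the next block; all of length block_size
            except possibly the last (an assumption of this sketch).
    """
    # The block holding the rank is found by arithmetic alone, in O(1).
    index, offset = divmod(rank, block_size)
    # Only this one block is inspected. sorted() is a stand-in for
    # linear-time selection within the block.
    return sorted(blocks[index])[offset]
```

With nine weights in three blocks, `select_via_blocks([[2, 1, 3], [6, 4, 5], [9, 7, 8]], 3, 4)` looks only at the middle block and returns the median, 5. The cost depends on the block size rather than on the number of weights at the level.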

Theorem 2. Constructing minimum-redundancy prefix codes can be done in 
O(nk) time. 
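
To make the parameter k in Theorem 2 concrete, the sketch below computes optimal codeword lengths and counts the distinct ones. It deliberately uses the standard heap-based Huffman construction, not the O(nk) algorithm of this paper, and the function names are hypothetical.

```python
import heapq
from itertools import count


def huffman_lengths(weights):
    """Optimal codeword lengths via the classic heap-based Huffman method.

    Illustration only: this simple version tracks leaf lists explicitly and
    is not the O(nk) construction described in the paper.
    """
    if len(weights) < 2:
        raise ValueError("need at least two weights")
    tie = count()  # tie-breaker so the heap never compares the leaf lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for leaf in a + b:  # every leaf under the new node gets one level deeper
            lengths[leaf] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), a + b))
    return lengths


def distinct_lengths(weights):
    """The parameter k: number of distinct optimal codeword lengths."""
    return len(set(huffman_lengths(weights)))
```

Eight equal weights all receive length 3, so k = 1, while the skewed list `[1, 1, 2, 4, 8]` receives lengths `[4, 4, 3, 2, 1]`, so k = 4. The O(nk) bound is most favorable for inputs of the first kind.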

6 Conclusion 

We gave a distribution-sensitive algorithm for constructing minimum-redundancy 
prefix codes, whose running time is O(nk). For small values of k, this algorithm 
asymptotically improves over other known algorithms that require O(n log n) time; 
it is quite interesting to know that the construction of optimal codes can be 
done in linear time when k turns out to be a constant. For small values of k, if 
the sequence of weights is already sorted, the number of comparisons performed 
by our algorithm is asymptotically better than other known algorithms that 
require O(n) comparisons. For such sorted sequences, the number of comparisons 
required is poly-logarithmic when k is a constant. 

We have shown in [1] that the verification of a given prefix code for optimality 
requires $\Omega(n \log n)$ in the algebraic decision-tree model. Such lower bound was 
illustrated through an example of a prefix code that has $k = O(\log n)$ distinct 
codeword lengths. Since the construction is harder than the verification, it follows 
that constructing the solution for the codes of this example requires $\Omega(n \log n)$. 
This implies a lower bound of $\Omega(nk)$ for constructing optimal prefix codes, for 
otherwise we could have been able to construct the optimal codes for the example 
in [1] in $o(n \log n)$. This implies that our algorithm is asymptotically optimal. 

One remaining question is whether the algorithm can be made faster 
in practice by avoiding so many recursive calls to a median-finding algorithm. 